{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 文本转向量介绍\n",
    "\n",
    "\n",
    "#### 如何使用bert模型将文本转换成向量的\n",
    "1. 主要是使用hidden_state值、attention_mask 来做处理的。\n",
    "2. 基本的参考链接可以看这里 https://github.com/UKPLab/sentence-transformers/blob/06f5c4e9857f013da2657d43a77d9f5f0bf50a61/sentence_transformers/models/Pooling.py#L128\n",
    "3. 最常见的，或者最方便的，就是提取cls对应的hidden_status的值。当然，还有别的，这里先不展开介绍。\n",
    "\n",
    "\n",
    "#### 标准化问题\n",
    "1. 为什么有的时候是使用cos相似度，有的时候，就是直接用矩阵乘法\n",
    "2. 实际上是等效的：\n",
    "- 2.1 如果模型输出的时候，已经做过标准化了，那就直接使用矩阵乘法就行了。\n",
    "- 2.2 如果模型输出的时候，没有做标准化，那就使用cos相似度\n",
    "\n",
    "3. 做标准化部分，就相当于提前做了$A / ||A||$操作\n",
    "\n",
    "![](images/cos.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 如何使用文本转向量模型\n",
    "### 使用transformers方式\n",
    "1. 建议大家使用这个方式，而不是直接使用sentence-transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer, AutoModel\n",
    "import torch\n",
    "\n",
    "model_name_or_path = \"model/bge-base-zh-v1.5\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sentence embeddings: tensor([[-0.0157, -0.0291,  0.0919,  ..., -0.0066,  0.0219,  0.0269],\n",
      "        [-0.0168, -0.0304,  0.1028,  ..., -0.0277,  0.0120,  0.0091]])\n"
     ]
    }
   ],
   "source": [
    "# Sentences we want sentence embeddings for\n",
    "sentences = [\"样例数据-1\", \"样例数据-2\"]\n",
    "\n",
    "# Load model from HuggingFace Hub\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)\n",
    "model = AutoModel.from_pretrained(model_name_or_path)\n",
    "model.eval()\n",
    "\n",
    "# Tokenize sentences\n",
    "encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors=\"pt\")\n",
    "# for s2p(short query to long passage) retrieval task, add an instruction to query (not add instruction for passages)\n",
    "# encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')\n",
    "\n",
    "# Compute token embeddings\n",
    "with torch.no_grad():\n",
    "    model_output = model(**encoded_input)\n",
    "    # Perform pooling. In this case, cls pooling.\n",
    "    sentence_embeddings = model_output[0][:, 0]\n",
    "# normalize embeddings\n",
    "sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)\n",
    "print(\"Sentence embeddings:\", sentence_embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## onnx 推理转换\n",
    "\n",
    "### onnx必须使用的环境\n",
    "1. http://www.xavierdupre.fr/app/onnxcustom/helpsphinx/api/onnxruntime_python/helpers.html\n",
    "\n",
    "```BASH\n",
    "pip install \"optimum[onnxruntime]\" # cpu版本\n",
    "pip install \"optimum[onnxruntime-gpu]\" # gpu版本\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['TensorrtExecutionProvider',\n",
      " 'CUDAExecutionProvider',\n",
      " 'AzureExecutionProvider',\n",
      " 'CPUExecutionProvider']\n"
     ]
    }
   ],
   "source": [
    "import pprint\n",
    "import onnxruntime\n",
    "\n",
    "pprint.pprint(onnxruntime.get_available_providers())  #  'CUDAExecutionProvider', 一定要出现这个"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "odict_keys(['last_hidden_state', 'pooler_output'])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_output.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([2, 768])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_output.last_hidden_state[:,0].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 将模型进行转换\n",
    "1. sbert = bert + pool\n",
    "2. optimum do not have sbert\n",
    "- 2.1 optimum for bert \n",
    "- 2.2 pool \n",
    "- xxxx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The argument `from_transformers` is deprecated, and will be removed in optimum 2.0.  Use `export` instead\n",
      "Framework not specified. Using pt to export the model.\n",
      "Using the export variant default. Available variants are:\n",
      "    - default: The default ONNX variant.\n",
      "Using framework PyTorch: 2.2.0+cu118\n",
      "Overriding 1 configuration item(s)\n",
      "\t- use_cache -> False\n",
      "/home/yuanz/anaconda3/envs/try0303/lib/python3.11/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.\n",
      "  warnings.warn(\n",
      "Optimizing model...\n",
      "\u001b[0;93m2024-03-08 07:18:20.809915600 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.\u001b[m\n",
      "\u001b[0;93m2024-03-08 07:18:20.809946916 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.\u001b[m\n",
      "Configuration saved in model/bge_base_zh_v1_5_onnx/ort_config.json\n",
      "Optimized model saved at: model/bge_base_zh_v1_5_onnx (external data format: False; saved all tensor to one file: True)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "PosixPath('model/bge_base_zh_v1_5_onnx')"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from optimum.onnxruntime.configuration import OptimizationConfig\n",
    "from optimum.onnxruntime import ORTOptimizer, ORTModelForFeatureExtraction\n",
    "\n",
    "model_id = \"model/bge-base-zh-v1.5\"\n",
    "onnx_path = \"model/bge_base_zh_v1_5_onnx\"\n",
    "\n",
    "model = ORTModelForFeatureExtraction.from_pretrained(model_id=model_id, from_transformers=True)\n",
    "optimizer = ORTOptimizer.from_pretrained(model)\n",
    "\n",
    "optimizer_config = OptimizationConfig(\n",
    "    optimization_level=2,\n",
    "    optimize_for_gpu=True,\n",
    "    fp16=True\n",
    ")\n",
    "optimizer.optimize(save_dir=onnx_path, optimization_config=optimizer_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### onnx 开始推理（测试准确性和效率）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from optimum.onnxruntime import ORTModelForFeatureExtraction\n",
    "from transformers import AutoTokenizer,AutoModel\n",
    "import torch.nn.functional as F\n",
    "from tqdm import tqdm\n",
    "from pathlib import Path\n",
    "from typing import List\n",
    "import time\n",
    "import numpy as np \n",
    "import torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[0;93m2024-03-08 07:52:08.739857954 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.\u001b[m\n",
      "\u001b[0;93m2024-03-08 07:52:08.739890976 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.\u001b[m\n"
     ]
    }
   ],
   "source": [
    "model_id = \"model/bge-base-zh-v1.5\"\n",
    "onnx_path = \"model/bge_base_zh_v1_5_onnx\"\n",
    "\n",
    "\n",
    "def load_model_raw(model_id):\n",
    "    model_raw = AutoModel.from_pretrained(model_id, device_map=\"cuda:0\")\n",
    "    # model_raw = model_raw.to(torch.float16)\n",
    "    model_raw.eval()\n",
    "    # model_raw.half()\n",
    "    return model_raw\n",
    "\n",
    "\n",
    "def load_model_ort(model_path):\n",
    "    model = ORTModelForFeatureExtraction.from_pretrained(\n",
    "        model_id=model_path,\n",
    "        file_name=\"model_optimized.onnx\",\n",
    "        provider=\"CUDAExecutionProvider\",\n",
    "    )\n",
    "    return model\n",
    "\n",
    "\n",
    "model_raw = load_model_raw(model_id=model_id)\n",
    "model_ort = load_model_ort(model_path=onnx_path)\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "sentece1 = [\"啊哈哈哈哈，我爱良睦路程序员\"]\n",
    "\n",
    "tokenizer_output = tokenizer(sentece1, padding=True, truncation=True, return_tensors=\"pt\")\n",
    "# tokenizer_output\n",
    "\n",
    "for k in tokenizer_output.keys():\n",
    "    tokenizer_output[k] = tokenizer_output[k].cuda()\n",
    "\n",
    "raw_o1 = model_raw(**tokenizer_output)\n",
    "ort_o1 = model_ort(**tokenizer_output)\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(odict_keys(['last_hidden_state', 'pooler_output']),\n",
       " odict_keys(['last_hidden_state']))"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw_o1.keys(), ort_o1.keys()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.35131836,  0.00605774,  0.12158203, -0.6694336 ], dtype=float32)"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ort_o1[\"last_hidden_state\"].detach().cpu().numpy()[0, 0, :4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.35138747,  0.00591746,  0.12244577, -0.66722745], dtype=float32)"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw_o1[\"last_hidden_state\"].detach().cpu().numpy()[0, 0, :4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(False, True)"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# raw_o1['last_hidden_state'].detach()\n",
    "ort_o1[\"last_hidden_state\"].detach()\n",
    "\n",
    "(\n",
    "    np.allclose(\n",
    "        raw_o1[\"last_hidden_state\"].detach().cpu().numpy(),\n",
    "        ort_o1[\"last_hidden_state\"].detach().cpu().numpy(),\n",
    "        atol=1e-3,\n",
    "    ),\n",
    "    np.allclose(\n",
    "        raw_o1[\"last_hidden_state\"].detach().cpu().numpy(),\n",
    "        ort_o1[\"last_hidden_state\"].detach().cpu().numpy(),\n",
    "        atol=1e-1,\n",
    "    ),\n",
    ") # 误差有点大，不要紧"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 测试效率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "raw infer: 100%|██████████| 100/100 [00:00<00:00, 115.30it/s]\n",
      "ort infer: 100%|██████████| 100/100 [00:00<00:00, 633.66it/s]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(0.8699741363525391, 0.15927767753601074)"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1 = time.time()\n",
    "for i in tqdm(range(100), desc=\"raw infer\"):\n",
    "    raw_o1 = model_raw(**tokenizer_output)\n",
    "\n",
    "s_raw = time.time() - s1\n",
    "\n",
    "\n",
    "s1 = time.time()\n",
    "for i in tqdm(range(100), desc=\"ort infer\"):\n",
    "    ort_o1 = model_ort(**tokenizer_output)\n",
    "\n",
    "s_ort = time.time() - s1\n",
    "\n",
    "(s_raw, s_ort)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 测试准确性"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sentences we want sentence embeddings for\n",
    "sentences = [\"样例数据-1\", \"样例数据-2\"]\n",
    "\n",
    "\n",
    "def embd_func(model, tokenizer, inputs: List[str], normalize_embedding: bool = True):\n",
    "    # Load model from HuggingFace Hub\n",
    "    # Tokenize sentences\n",
    "    encoded_input = tokenizer(\n",
    "        inputs, padding=True, truncation=True, return_tensors=\"pt\"\n",
    "    )\n",
    "    # for s2p(short query to long passage) retrieval task, add an instruction to query (not add instruction for passages)\n",
    "    # encoded_input = tokenizer([instruction + q for q in queries], padding=True, truncation=True, return_tensors='pt')\n",
    "    for k in encoded_input.keys():\n",
    "        encoded_input[k] = encoded_input[k].cuda()\n",
    "    # Compute token embeddings\n",
    "    with torch.no_grad():\n",
    "        model_output = model(**encoded_input)\n",
    "        # Perform pooling. In this case, cls pooling.\n",
    "        sentence_embeddings = model_output[0][:, 0]\n",
    "    if normalize_embedding:\n",
    "    # normalize embeddings\n",
    "        sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)\n",
    "\n",
    "\n",
    "    return sentence_embeddings.detach().cpu().numpy()\n",
    "    # print(\"Sentence embeddings:\", sentence_embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([[-0.01574466, -0.0290632 ,  0.09189218, ..., -0.00664417,\n",
       "          0.02189073,  0.02688614],\n",
       "        [-0.01677734, -0.03039891,  0.10282401, ..., -0.02765604,\n",
       "          0.01199661,  0.00911522]], dtype=float32),\n",
       " array([[-0.01573375, -0.029145  ,  0.09198891, ..., -0.00649501,\n",
       "          0.02190429,  0.02698189],\n",
       "        [-0.01686918, -0.03031984,  0.10277913, ..., -0.02763865,\n",
       "          0.01200952,  0.00904345]], dtype=float32))"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sentence_embedding_raw = embd_func(model_raw,tokenizer, sentences,normalize_embedding=True)\n",
    "sentence_embedding_ort = embd_func(model_ort, tokenizer, sentences,normalize_embedding=True)\n",
    "\n",
    "sentence_embedding_raw[:4], sentence_embedding_ort[:4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.allclose(sentence_embedding_raw, sentence_embedding_ort, atol=1e-3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 总结\n",
    "1. sbert = bert + pool\n",
    "2. optimum -> sbert not !!\n",
    "3. optimum -> bert\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pytorch22",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
